
feat: add OpenRouter as LLM and embedding provider#56

Merged
RaghavChamadiya merged 5 commits into repowise-dev:main from oglenyaboss:feat/openrouter-provider
Apr 26, 2026
Conversation

@oglenyaboss
Contributor

@oglenyaboss oglenyaboss commented Apr 8, 2026

Summary

  • Add OpenRouter as a first-class LLM and embedding provider, enabling access to 200+ models through a single OPENROUTER_API_KEY
  • Both repowise init (doc generation) and repowise serve (chat/search) work with OpenRouter out of the box
  • No new pip dependency — uses the existing openai package (OpenAI-compatible API)

LLM Provider

  • OpenRouterProvider with generate() + stream_chat() support
  • Default model: anthropic/claude-sonnet-4.6
  • Sets recommended HTTP-Referer and X-Title headers for OpenRouter dashboard tracking
  • Rate limits: 60 RPM / 200K TPM (conservative defaults, user-overridable)

Embedding Provider

  • OpenRouterEmbedder for semantic search and chat RAG
  • Default model: google/gemini-embedding-001 (768 dims)
  • One API key covers both LLM and embeddings

Integration Points

  • Registered in both LLM and embedding registries (lazy import)
  • CLI auto-detection from OPENROUTER_API_KEY env var
  • Interactive provider/embedder selection in repowise init and repowise serve
  • Server provider catalog updated for web UI

Known Limitations

  • Cost tracking disabled for OpenRouter: since it proxies 200+ models with varying prices, the fallback pricing would show inflated numbers. Users should check the OpenRouter dashboard for actual costs. A future PR could fetch real pricing via the /api/v1/models endpoint.
  • Chat and embeddings not fully tested — LLM generation was tested end-to-end on a real 800+ file project using qwen/qwen3.6-plus via OpenRouter. Chat and embedding functionality was not integration-tested but follows the same OpenAI-compatible patterns as the existing OpenAI provider.

Test plan

  • 13 unit tests for OpenRouterProvider (construction, generation, error mapping, headers)
  • Registry test updated (builtin count 6 → 7, openrouter in list)
  • Integration test added (skipped without OPENROUTER_API_KEY)
  • Manual end-to-end test: repowise init with OpenRouter on a real project (328 wiki pages generated successfully)
  • Chat via web UI with OpenRouter as active provider
  • Embedding search with OpenRouterEmbedder

Add OpenRouter as a first-class provider, enabling access to 200+ models
(Claude, GPT, Gemini, Llama, Qwen, etc.) through a single API key.

LLM provider:
- New OpenRouterProvider using OpenAI-compatible endpoint
- Supports generate() and stream_chat() (ChatProvider protocol)
- Sets recommended HTTP-Referer and X-Title headers
- Default model: anthropic/claude-sonnet-4.6
- Rate limits: 60 RPM / 200K TPM
- Cost tracking intentionally disabled (OpenRouter proxies models
  with varying prices — users should check the OpenRouter dashboard)

Embedding provider:
- New OpenRouterEmbedder for vector search and chat
- Default model: google/gemini-embedding-001 (768 dims)
- One OPENROUTER_API_KEY covers both LLM and embeddings

Integration:
- Registered in LLM and embedding registries (lazy import)
- CLI auto-detection from OPENROUTER_API_KEY env var
- Interactive provider selection in `repowise init`
- Embedder selection in `repowise serve`
- Server provider catalog for web UI
- No new pip dependency (uses existing openai package)

Tests:
- 13 unit tests (construction, generation, error mapping, headers)
- Registry test updated (builtin count 6 → 7)
- Integration test (skipped without OPENROUTER_API_KEY)
@RaghavChamadiya
Collaborator

This is really thorough, probably the most complete provider PR we've gotten. LLM + embedder + CLI + server catalog all done, and the honest known limitations section is appreciated.

One thing to fix before merge:

  1. cost_tracker dead code (blocking): the constructor accepts cost_tracker but hardcodes it to None, and then there's an if self._cost_tracker block in generate() that can never execute. Either remove the parameter entirely or add a log warning that cost tracking is disabled for OpenRouter. Just don't want dead code sitting there confusing people.

Also flagging for awareness (not blocking):

  1. stream_chat duplication: the streaming logic is identical to OpenAI's, about 95 lines copy-pasted. Would be nice to extract a shared base eventually, but not blocking this PR on it.

  2. Tests: no coverage for stream_chat or OpenRouterEmbedder. Even basic construction + mock tests would help, but also not blocking.

Should be a quick fix, happy to merge after (1).

Remove the cost_tracker parameter and unreachable if-block from
generate(). OpenRouter proxies 200+ models with varying prices,
so cost tracking is documented as unsupported — users should check
the OpenRouter dashboard instead.
- stream_chat: text deltas, tool calls, rate limit error (3 tests)
- OpenRouterEmbedder: construction, dimensions, embedding, base URL (12 tests)
@oglenyaboss
Contributor Author

Thanks for the review!

  1. cost_tracker dead code — fixed in c1c1773. Removed the cost_tracker parameter and the unreachable if self._cost_tracker block entirely. The constructor now accepts **_kwargs so it won't break if the pipeline passes cost_tracker= at instantiation time.

  2. stream_chat duplication — agreed, the shared base extraction makes sense. Happy to tackle that in a follow-up PR if you'd like.

  3. Tests — added in 57d40c7:

  • stream_chat: text deltas, tool calls, and rate limit error (3 tests)
  • OpenRouterEmbedder: construction, dimensions, embedding, and base URL verification (12 tests)

Collaborator

@swati510 swati510 left a comment


Nice clean provider addition, follows the existing shape well. A few things worth looking at before merge:

  1. Silent dimension corruption risk in OpenRouterEmbedder: _DIMS falls back to 768 for any model not in the dict, but the actual model might be 1024 or 3072 dims. If a user picks e.g. cohere/embed-english-v3, they'll get 768-dim stored vectors that don't match the model's real output, silently corrupting the vector store. Safer to raise ValueError("unknown embedding model %s, add to _DIMS") instead of falling back.

  2. OpenAI param compat: you're using max_completion_tokens, which is the newer OpenAI SDK name. OpenRouter proxies 200+ models and not all of them accept that param (some only take max_tokens). Worth smoke-testing against a few of the listed defaults (anthropic/claude-sonnet-4.6, google/gemini-*, meta-llama/*) to confirm it works, and falling back to max_tokens if not.

  3. The OpenRouterProvider constructor accepts **_kwargs: Any and silently drops them. If the registry later starts passing e.g. cost_tracker=..., rate_limiter=..., or tier=... (now that minimax/zai use tier), those will just vanish. Drop the **_kwargs catchall or convert it to explicit params like every other provider.

Embedder dimensions fallback is the one that could bite users badly, the other two are mostly hygiene.


@property
def dimensions(self) -> int:
    return self._DIMS.get(self._model, 768)
Collaborator


Silent 768-dim fallback for unknown models is a footgun. If a user picks a model with different real dimensions (e.g. 1024), stored vectors won't match the model output and the vector store is corrupted without any error. Raise ValueError here instead, force users to add new models to _DIMS explicitly.
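A sketch of the fail-fast construction this comment asks for, with the check moved to `__init__` so misconfiguration surfaces at startup (the class name is illustrative and `_DIMS` is trimmed to the one 768-dim default model named in the PR):

```python
# Sketch of fail-fast dimension validation; only the default model's
# 768-dim entry is taken from the PR, the class name is illustrative.
class OpenRouterEmbedderSketch:
    _DIMS = {
        "google/gemini-embedding-001": 768,
    }

    def __init__(self, model: str = "google/gemini-embedding-001") -> None:
        if model not in self._DIMS:
            known = ", ".join(sorted(self._DIMS))
            raise ValueError(
                f"unknown embedding model {model!r}; add it to _DIMS "
                f"(known models: {known})"
            )
        self._model = model

    @property
    def dimensions(self) -> int:
        # Plain lookup: __init__ guarantees the model is in _DIMS,
        # so no silent fallback is possible here.
        return self._DIMS[self._model]
```

Failing in the constructor rather than the `dimensions` property means a bad model name blows up before any vectors are sized or stored.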

rate_limiter: RateLimiter | None = None,
http_referer: str | None = None,
app_title: str = "repowise",
**_kwargs: Any,
Collaborator


Silently swallowing unknown kwargs means any future change to how the registry constructs providers (e.g. passing cost_tracker, rate_limiter, tier) will fail silently for this provider. Drop the catchall and declare the params you accept explicitly.

- Embedder: raise ValueError in __init__ for unknown models instead of
  silently falling back to 768 dims, which would mis-size stored vectors
  against the model's real output and corrupt the vector store.
- Provider: drop **_kwargs catchall and accept cost_tracker explicitly so
  unknown kwargs from future registry changes fail loudly instead of
  vanishing.
- Provider: switch max_completion_tokens → max_tokens in generate() and
  stream_chat(). Per OpenRouter API docs, max_tokens is the universal
  parameter across the 200+ proxied models; max_completion_tokens is an
  OpenAI-specific newer name not all proxied models accept.
@oglenyaboss
Contributor Author

Thanks for the catches! Fixed all three in 1f9f67a:

1. Silent dimension fallback → ValueError. Moved the check to __init__ rather than the dimensions property — fails at construction time so misconfig surfaces at startup instead of at the first vector-store sizing call. Error message names the bad model and lists the known ones so the fix is obvious.

2. **_kwargs catchall → explicit cost_tracker param. Dropped the catchall and added cost_tracker: CostTracker | None = None to match the other providers' signatures. Accepted but unused (still no cost tracking for OpenRouter — docstring updated to say so). Any future kwarg from the registry will now blow up loudly instead of vanishing. Added a test_rejects_unknown_kwargs to lock that in.

3. max_completion_tokens → max_tokens. Verified via OpenRouter's API docs that max_tokens is the standard parameter — it's what's in their main Request schema and all their Python/Swift examples. max_completion_tokens is an OpenAI-specific newer name. Switched both generate() and stream_chat().

Tests updated: replaced the unknown-model-defaults test with a construction-failure test, switched the kwargs assertion to max_tokens, and added the cost_tracker positive + unknown-kwarg negative tests. 104/104 provider+persistence tests pass.

Collaborator

@RaghavChamadiya RaghavChamadiya left a comment


Solid provider addition that follows the existing shape. The follow-up commits address all three of the earlier review points (embedder dimension fail-fast, max_tokens compat for non-OpenAI models behind OpenRouter, removal of the **_kwargs catchall). Tests are thorough.

@RaghavChamadiya RaghavChamadiya merged commit b6b7e97 into repowise-dev:main Apr 26, 2026
5 checks passed
@RaghavChamadiya RaghavChamadiya mentioned this pull request Apr 26, 2026
RaghavChamadiya added a commit that referenced this pull request Apr 26, 2026
* feat: improve PreToolUse hook relevance with multi-signal search

Replace FTS-only file retrieval with a 3-signal ranking system:
- Symbol name match (weight 2.0) — most precise
- File path match (weight 1.5) — catches path-based searches
- FTS on wiki content (weight 1.0) — broadest, lowest priority
Files ranked by signal score then PageRank, top 3 returned.
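The weighted ranking described in this commit can be sketched as follows (the candidate data shape and function name are assumptions; the weights and top-3 cut come from the commit message):

```python
# Sketch of the 3-signal ranking: weighted signal hits, PageRank as
# tiebreaker, top 3 returned. Data shape and function name are
# illustrative; the weights are the ones in the commit message.
def rank_files(candidates: list[dict], top_n: int = 3) -> list[str]:
    """candidates: dicts with 'path', boolean signal flags, 'pagerank'."""
    weights = {"symbol_match": 2.0, "path_match": 1.5, "fts_match": 1.0}

    def score(c: dict) -> float:
        return sum(w for sig, w in weights.items() if c.get(sig))

    # Rank by combined signal score, break ties with PageRank.
    ranked = sorted(
        candidates,
        key=lambda c: (score(c), c.get("pagerank", 0.0)),
        reverse=True,
    )
    return [c["path"] for c in ranked[:top_n]]
```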

Remove git signals (HOTSPOT, bus-factor, owner) from enrichment —
that info belongs in get_risk, not every search. Remove Bash command
interception (fragile regex on grep/rg commands).

Keep: symbols (3), importers (3), dependencies (2) per file.

* release: v0.3.1

Bumps repowise to 0.3.1 across pyproject.toml and the three sub-package
__init__.py files.

Highlights since 0.3.0:

- Output language support for generated wiki content (#99)
- Luau / Roblox language support (#89)
- OpenRouter LLM and embedding provider (#56)
- base_url plus per-provider env vars for OpenAI / Anthropic / Gemini /
  Ollama / LiteLLM (#85)
- SQLite WAL plus busy_timeout plus FK constraints, fixing concurrent
  'repowise update' database is locked errors (#101)
- CLAUDE.md opt-out prompt now asked in both full and advanced modes
  and the answer is honoured (#102)
- repowise init no longer silently overwrites unparseable user JSON
  configs (#94)
- pyproject packages list resynced with the language-support refactor
  so editable installs and CI build cleanly (#97)
- uv workflow documented and dev deps migrated to PEP 735
  dependency-groups, silencing the tool.uv.dev-dependencies deprecation
  warning (#100)
- Five Dependabot security bumps (dompurify, gitpython, mako, litellm,
  python-multipart)

Also flips the project URLs and serve_cmd's _GITHUB_REPO constant from
RaghavChamadiya/repowise to repowise-dev/repowise so 'repowise serve'
can locate the published web UI tarball.
